美化你的单细胞各个亚群特异性高表达基因热图
单细胞数据分析里面最基础的就是降维聚类分群,参考前面的例子:人人都能学会的单细胞聚类分群注释 ,这个大家基本上问题不大了,使用seurat标准流程即可,不过它默认出图并不好看,详见以前我们做的投票:可视化单细胞亚群的标记基因的5个方法,下面的5个基础函数相信大家都是已经烂熟于心了:
VlnPlot(pbmc, features = c("MS4A1", "CD79A")) FeaturePlot(pbmc, features = c("MS4A1", "CD79A")) RidgePlot(pbmc, features = c("MS4A1", "CD79A"), ncol = 1) DotPlot(pbmc, features = unique(features)) + RotatedAxis() DoHeatmap(subset(pbmc, downsample = 100), features = features, size = 3)
最近,郑州大学第一附属医院的史阳同学无私的分享了他对这些基础函数的改造,颜值说不上巅峰,但打败基础函数是没有问题的, 同时也算是抛砖引玉吧,希望广大生信技能树粉丝们都投稿分享自己的创作,投稿请发邮件到 jmzeng1314@163.com
仍然是以大家熟知的pbmc3k数据集为例
大家先安装这个数据集对应的包,并且对它进行降维聚类分群,参考前面的例子:人人都能学会的单细胞聚类分群注释 ,而且每个亚群找高表达量基因,都存储为Rdata文件。
# 0.安装R包 ----
# devtools::install_github('caleblareau/BuenColors')
# utils::install.packages(pkgs = "ggstatsplot")
# InstallData("pbmc3k")
# 1.加载R包和测试数据 ----
rm(list = ls())
library(SeuratData) #加载seurat数据集
getOption('timeout')
options(timeout = 10000)
data("pbmc3k")
sce <- pbmc3k.final
library(Seurat)
DimPlot(sce,reduction = "umap",label = TRUE)
unique(Idents(sce))
sce$celltype = Idents(sce)
# 2.单细胞分析基本流程 ----
# 主要是拿到 tSNE和Umap的坐标,因为默认的 pbmc3k.final 是没有的
sce <- NormalizeData(sce, normalization.method = "LogNormalize",
scale.factor = 1e4)
sce <- FindVariableFeatures(sce,
selection.method = "vst", nfeatures = 2000)
sce <- ScaleData(sce)
sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce))
sce <- FindNeighbors(sce, dims = 1:15)
sce <- FindClusters(sce, resolution = 0.8)
set.seed(123)
sce <- RunTSNE(object = sce, dims = 1:15, do.fast = TRUE)
sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE)
DimPlot(object = sce, reduction = "umap",label = TRUE)
# 3.定义分组信息 ----
Idents(sce) = sce$celltype
# 4.寻找各个单细胞亚群的特征性高表达量基因----
sce.markers <- FindAllMarkers(object = sce,
only.pos = TRUE,
min.pct = 0.25,
thresh.use = 0.25)
save(sce, sce.markers, file = 'tmp.Rdata')
接下来改造 函数
先放一下改造效果图:
可以看到如下所示的的自定义基因列表,并没有如上所示的算法挑选的top2基因效果好:
代码很简单;
source('DoHeatmapPlot.R')
library(patchwork)
cg <- c("CD4","CD3E","IL7R", "KLF2", "CCR7","TCF7",
"SELL", "CCL4", "CCL5", "PRF1", "GZMB",
"GZMK", "FGFBP2", "CX3CR1", "RORC",
"CXCR5", "FOXP3", "GATA3", "PTGDR2")
marker_selected_1 <- sce.markers %>%
dplyr::filter(p_val_adj < 0.05) %>%
dplyr::filter(pct.1 >= 0.5 & pct.2 <= 0.6) %>%
dplyr::group_by(cluster) %>%
dplyr::slice_max(order_by = avg_log2FC, n = 2)
DoHeatmapPlot(object = sce, groupBy = 'celltype', features = marker_selected_1$gene)
DoHeatmapPlot(object = sce, groupBy = 'celltype', features = cg)
所有的细节都在 DoHeatmapPlot.R 文件里面的 DoHeatmapPlot 函数, 是自定义的,内容如下所示:
DoHeatmapPlot <- function(object, groupBy, features) {
require(ComplexHeatmap)
# (1)获取绘图数据
plot_data = SeuratObject::FetchData(object = object,
vars = c(features, groupBy),
slot = 'counts') %>%
dplyr::mutate(across(.cols = where(is.numeric), .fns = ~ log2(.x + 1))) %>%
dplyr::rename(group = as.name(groupBy)) %>%
dplyr::arrange(group) %T>%
assign(x = 'clusterInfo', value = .$group, envir = .GlobalEnv) %>%
dplyr::select(-group) %>%
t()
# (2)配色方案:
require(BuenColors)
col = jdb_color_maps[1:length(unique(clusterInfo))]
names(col) = as.character(unique(clusterInfo))
# (3)列(聚类)注释:
# HeatmapAnnotation函数增加聚类注释,如条形图,点图,折线图,箱线图,密度图:
top_anno = HeatmapAnnotation(
cluster = anno_block(gp = gpar(fill = col), #设置填充色;
labels = as.character(unique(clusterInfo)),
labels_gp = gpar(cex = 0.5,
col = 'white',
family = 'Arial'))) #设置字体;
# (4)行注释:
# rowAnnotation和anno_mark突出重点基因:
plot_data = as.data.frame(plot_data)
gene_pos = which(rownames(plot_data) %in% features)
#which和%in%联用返回目标向量位置;
plot_data = as.matrix(plot_data)
row_anno = rowAnnotation(mark_gene = anno_mark(at = gene_pos,
labels = features))
# (5)绘图:
Heatmap(matrix = plot_data,
cluster_rows = FALSE,
cluster_columns = FALSE,
show_column_names = FALSE,
show_row_names = FALSE,
show_heatmap_legend = TRUE,
column_split = clusterInfo,
top_annotation = top_anno, #在热图上边增加注释;
column_title = NULL,
right_annotation = row_anno,
use_raster = FALSE,
heatmap_legend_param = list(
title = 'log2(count+1)',
title_position = 'leftcenter-rot'))
}
主要是深度定制了 ComplexHeatmap 包,懂得小伙伴应该是可以做出来更好的版本。目前看起来比原图稍微好一点点,原图如下所示:
这样的基础认知,也可以看基础10讲:
01. 上游分析流程 02.课题多少个样品,测序数据量如何 03. 过滤不合格细胞和基因(数据质控很重要) 04. 过滤线粒体核糖体基因 05. 去除细胞效应和基因效应 06.单细胞转录组数据的降维聚类分群 07.单细胞转录组数据处理之细胞亚群注释 08.把拿到的亚群进行更细致的分群 09.单细胞转录组数据处理之细胞亚群比例比较
最基础的往往是降维聚类分群,参考前面的例子:人人都能学会的单细胞聚类分群注释
写在文末
我在《生信技能树》,《生信菜鸟团》,《单细胞天地》的大量推文教程里面共享的代码都是复制粘贴即可使用的, 有任何疑问欢迎留言讨论,也可以发邮件给我,详细描述你遇到的困难的前因后果给我,我的邮箱地址是 jmzeng1314@163.com
如果你确实觉得我的教程对你的科研课题有帮助,让你茅塞顿开,或者说你的课题大量使用我的技能,烦请日后在发表自己的成果的时候,加上一个简短的致谢,如下所示:
We thank Dr.Jianming Zeng(University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.
十年后我环游世界各地的高校以及科研院所(当然包括中国大陆)的时候,如果有这样的情谊,我会优先见你。